Generalization to New Compositions of Known Entities in Image Understanding

Author

  • Yuval Atzmon
Abstract

Recurrent neural networks can be trained to describe images with natural language, but it has been observed that they generalize poorly to new scenes at test time. Here we provide an experimental framework to quantify their generalization to unseen compositions. By describing images using short structured representations, we tease apart and evaluate separately two types of generalization: (1) generalization to new images of similar scenes, and (2) generalization to unseen compositions of known entities. We quantify these two types of generalization by a large-scale experiment on the MS-COCO dataset with a state-of-the-art recurrent network, and compare to a baseline structured prediction model on top of a deep network. We find that a state-of-the-art image captioning approach is largely “blind” to new combinations of known entities (∼2.3% precision@1), and achieves statistically similar precision@1 to that of a considerably simpler structured-prediction model with much smaller capacity. We therefore advocate using compositional generalization metrics to evaluate vision and language models, since generalizing to new combinations of known entities is key for understanding complex real data.
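The evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes each image is summarized by a single (subject, relation, object) triplet and that precision@1 is an exact match between the top prediction and the gold triplet; the function names `compositional_split` and `precision_at_1` are hypothetical.

```python
def compositional_split(samples, held_out):
    """Split (image_id, triplet) pairs for a compositional test.

    Assumption: each triplet is (subject, relation, object).
    Triplets in `held_out` never appear in training, and a test
    sample is kept only if every one of its entities was seen in
    some training triplet, so the test set contains new
    combinations of known entities only.
    """
    train = [(i, t) for i, t in samples if t not in held_out]
    seen = {entity for _, t in train for entity in t}
    test = [
        (i, t)
        for i, t in samples
        if t in held_out and all(entity in seen for entity in t)
    ]
    return train, test


def precision_at_1(predictions, gold):
    """Fraction of samples whose top prediction equals the gold triplet."""
    if not gold:
        raise ValueError("gold must not be empty")
    hits = sum(1 for p, g in zip(predictions, gold) if p == g)
    return hits / len(gold)


# Toy usage with made-up data (not taken from MS-COCO).
samples = [
    ("img1", ("man", "ride", "horse")),
    ("img2", ("woman", "ride", "bike")),
    ("img3", ("man", "ride", "bike")),   # new combination of known entities
    ("img4", ("dog", "chase", "cat")),   # entities never seen in training
]
held_out = {("man", "ride", "bike"), ("dog", "chase", "cat")}
train, test = compositional_split(samples, held_out)
```

In this toy split only `img3` survives as a test sample, because `img4` uses entities that never occur in training. A model that simply repeats training compositions would score zero on such a test set, which is the kind of failure the compositional metric is meant to expose.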

Related articles

Learning to generalize to new compositions in image understanding

Recurrent neural networks have recently been used for learning to describe images using natural language. However, it has been observed that these models generalize poorly to scenes that were not observed during training, possibly depending too strongly on the statistics of the text in the training data. Here we propose to describe images using short structured representations, aiming to captur...

16th century Persian tiles in dialogue with 21st century digital tiles in the Sadrian universe

This article brings together tiles of 16th century Persian architecture and 21st century digital tiles of moving image to explore new potentials beyond the perceived image. As minimal parts of a bigger image, they both appear still and motionless. However, Persian Islamic philosopher, Mulla Sadrā Shirazi’s (1571-1640) theory of ‘substantial motion’ (al-harakat a...

The Study of Image by Aby Warburg

Warburg is one of the most inspiring figures in the German tradition of art historiography. Although one of the well-known aspects of Warburg’s intellectual work is its relation to iconology, this article seeks to go beyond the conventional understanding of Warburg and to map a picture of his “special” view of the image as a whole. To this end, by re-reading the classic Warburg texts and new in...

Simultaneous generalizations of known fixed point theorems for a Meir-Keeler type condition with applications

In this paper, we first establish a new fixed point theorem for a Meir-Keeler type condition. As an application, we derive a simultaneous generalization of the Banach contraction principle, Kannan's fixed point theorem, Chatterjea's fixed point theorem, and other fixed point theorems. Some new fixed point theorems are also obtained.

On a generalization of condition (PWP)

There is a flatness property of acts over monoids called Condition $(PWP)$ which, so far, has received much attention. In this paper, we introduce Condition GP-$(P)$, which is a generalization of Condition $(PWP)$. Firstly, some characterizations of monoids by Condition GP-$(P)$ of their (cyclic, Rees factor) acts are given, and many known results are generalized. More...

Publication date: 2017